Rebuilding search indexes by using a direct connection to the database (using PRPC Utils script)

Change History:
Version   Date         Created By       Comment
v1        12/05/2020   Shourav Ushank   Initial Creation


Artefacts Needed: 


• scripts folder from the Pega installation bundle - this folder contains the utils
directory and other essential files required to run the process


utils directory contents: 


Name                              Type
vtable                            File folder
connection.properties             PROPERTIES File
prbootstrap.properties            PROPERTIES File
prconfig.xml                      XML Document
prpcServiceUtils.bat              Windows Batch File
prpcServiceUtils.properties       PROPERTIES File
prpcServiceUtils.sh               SH File
prpcServiceUtilsWrapper.xml       XML Document
prpcUtils.bat                     Windows Batch File
prpcUtils.properties              PROPERTIES File
prpcUtils.sh                      SH File
prpcUtils.xml                     XML Document
prpcUtilsWrapper.xml              XML Document
README.txt                        Text Document
serviceConnection.properties      PROPERTIES File
staticassembler.zip               WinZip File

Files to be modified in utils:
• prpcUtils.properties
• prpcUtils.sh
• prbootstrap.properties (optional)
• prconfig.xml (optional)


• archives folder from the Pega installation bundle - this contains the following files
o pegadbinstall-classes.zip
o AssembledjavaClasses.zip
o prweb.war

• lib folder - this folder must be created and the ojdbc JAR copied into it.


Put the above folders in a named folder, say “FTSIndex”, in the PRPC directory of any node in
the cluster.
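
Before going further, the layout of the folder can be checked from the command line. The sketch below is a minimal example, not part of the Pega tooling: the function name check_ftsindex_layout is hypothetical, the FTSIndex path is passed as the first argument, and only a subset of the artefacts listed above is checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that an FTSIndex folder contains the main
# artefacts listed above. Prints each missing path and returns non-zero
# if any of them is absent.
check_ftsindex_layout() {
    local root="$1" missing=0 path
    for path in \
        scripts/utils/prpcUtils.sh \
        scripts/utils/prpcUtils.properties \
        scripts/utils/prbootstrap.properties \
        scripts/utils/prconfig.xml \
        archives/pegadbinstall-classes.zip \
        archives/prweb.war \
        lib
    do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_ftsindex_layout /path/to/PRPC/FTSIndex && echo "layout OK"` prints "layout OK" only when every checked path exists.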


Pre-Requisites on the application front to start indexing: 


• Ensure the FTSIncrementalIndexer agent is stopped on all nodes in the cluster.
• Ensure that no indexing process (from the UI or a script) is already running on any node in a
multi-node cluster system.
• Verify the following on the search landing page (Default tab) from a node in the cluster
(preferably a non-search node):
o Enable search indexing is ON

o Indexing is enabled for Rule, Data and Work, and Index work attachments is
unchecked

o Delete the current entry(ies) under the Search Index host node setting and save
the settings. Refresh to confirm that the settings are correctly set.

o The status should be shown as UNAVAILABLE and the Primary and Total size
should be shown as 0.


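[Screenshot: "Search indexing" section of the search landing page, with the "Turn on/off search indexing for all classes" switch, the Default indexes table (Name, Status, Primary size, Total size) listing All rules, All data and All work as UNAVAILABLE, and the "Index work attachments" checkbox.]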


• Ensure that RuleForm in the Advanced tab of the Class Definition is set for each class
that requires re-indexing (for CMT, it is already set for SG-Data-Email-Indexed and all
Work classes, so this step can be skipped)


[Screenshot: "Ruleform" section of the Class Definition Advanced tab, showing the Type (Harness) and RuleForm fields.]


• Ensure that the Full Text Search configuration - Exclude this class from search is
unchecked for all classes requiring re-indexing (for Data classes it is only done for the
SG-Data-Email-Indexed class)

[Screenshot: "Full text search" section with the "Exclude this class from search" checkbox.]

• Ensure that the PRPC mount contains sufficient disk space, and that the database has
sufficient tablespace for expansion during the indexing process
• Ensure the node(s) chosen to host Index Directory(ies) are of type “Search”
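
The disk-space part of these checks can be scripted. The following is a minimal sketch under stated assumptions: the helper name has_free_kb is hypothetical, and the threshold is whatever the team considers sufficient, since the source does not give a figure. It does not cover the database tablespace check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 kilobytes available.
has_free_kb() {
    local dir="$1" needed_kb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks; the fourth
    # column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, `has_free_kb /path/to/PRPC 10485760 || echo "less than 10 GB free"` warns when the mount has under 10 GB available; the 10 GB value is only an illustration.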


Pre-Requisites on the server front to start indexing: 


• Place the FTSIndex directory with the “archives”, “lib” and “scripts” folders under the
PRPC directory of the node from which we intend to run indexing.

• Update the files prpcUtils.properties, prpcUtils.sh, prbootstrap.properties and
prconfig.xml under the utils directory. This is discussed in detail in the “File
Modifications” section later.

• Create a directory “utilstemp” under the PRPC directory (of the node where the FTSIndex
directory is placed).

• Create a directory “PegaSearchIndex” (Pega recommended) under the PRPC
directory, which will be the index directory on the intended primary node. For
secondary/backup index directories, create a folder with the same name under the PRPC
directory of the other backup nodes.

Note: If planning secondary nodes, it is advised to have an odd number of nodes
(e.g. 1, 3, 5, 7) depending on the size of the cluster.

• Ensure full “777” permissions on each of the directories created above
(FTSIndex, utilstemp and PegaSearchIndex) and their contents.

• Remember to clean up or, better, create new directories whenever running a new batch
of re-indexing.
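
The directory steps above can be sketched as a small script. This is an illustrative example only: the function name prepare_index_dirs is hypothetical, and the PRPC directory is passed in as an argument rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the working directories under the PRPC
# directory given as $1 and open up permissions as described above.
prepare_index_dirs() {
    local prpc_dir="$1" name
    for name in FTSIndex utilstemp PegaSearchIndex; do
        mkdir -p "$prpc_dir/$name" || return 1
        # Full permissions on the directory and everything inside it.
        chmod -R 777 "$prpc_dir/$name" || return 1
    done
}
```

Running `prepare_index_dirs /path/to/PRPC` on the indexing node creates any of the three directories that do not exist yet and resets permissions on those that do. On backup nodes only PegaSearchIndex is needed, so the loop list would be shortened there.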


Note: All nodes should preferably be on the same physical server and the paths
File Modifications: 
• prpcUtils.sh
Add the JAVA_HOME variable as below:
export JAVA_HOME=<Compatible JDK path on the same server running the indexing>
e.g. export JAVA_HOME=/PEGDEVWEB017/home/webdev0l1/java7
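
A quick sanity check of the chosen path can catch a wrong JAVA_HOME before the indexer is started. The sketch below is a minimal example; check_java_home is a hypothetical helper name, and it only confirms that an executable bin/java exists, not that the JDK version is compatible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the directory in $1 looks like a usable
# Java home, i.e. it contains an executable bin/java.
check_java_home() {
    local java_home="$1"
    if [ -x "$java_home/bin/java" ]; then
        echo "ok: $java_home/bin/java"
    else
        echo "no executable java under: $java_home"
        return 1
    fi
}
```

For example, `check_java_home "$JAVA_HOME"` can be run in the same shell session that will launch prpcUtils.sh.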


• prpcUtils.properties

o Update Connection Information

JDBC driver path    pega.jdbc.driver.jar=/PEGUATWEB009/PRPC/FTSIndex/lib/ojdbc7.jar
JDBC driver class   pega.jdbc.driver.class=oracle.jdbc.OracleDriver
Database Type       pega.database.type=oracledate
JDBC URL            pega.jdbc.url=jdbc:oracle:thin:@dbbdevdb7243.fr.world.socgen:1522/PEGZD100
DB Username         pega.jdbc.username=<Pega DB username with Full Access>
DB Password         pega.jdbc.password=<Password of DB username>


o Set custom prconfig and prbootstrap (Optional)


pegarules.config=/PEGUATWEB009/PRPC/FTSIndex/scripts/utils/prconfig.xml
prbootstrap.config=/PEGUATWEB009/PRPC/FTSIndex/scripts/utils/prbootstrap.properties


o Provide Pega applications username & password 


pega.user.username=CMTBIXUser 


pega.user.password=rules 


o Update DB schema names 


rules.schema.name=PEG_RULES_OWNER
data.schema.name=PEG_DATA_OWNER
customerdata.schema.name=PEG_DATA_OWNER


o Set user temp directory to be used by the process 
user.temp.dir=/PEGUATWEB009/PRPC/utilstemp


o Update settings for full text indexer tool 


WORK                      DATA                                     RULE
indexing.indextype=Work   indexing.indextype=Data                  indexing.indextype=Rule
                          indexing.classes=SG-Data-Email-Indexed


indexing.messagesfrequency=10
indexing.indexdirectory=/PEGUATWEB009/PRPC/PegaSearchIndex
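
Because the indexer depends on many of these values, it can help to confirm that none were left blank. The sketch below is an assumption-laden example rather than a Pega utility: missing_props is a hypothetical helper, and the list of keys is simply the set of properties discussed in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the keys from the section above
# that are absent or have an empty value in the properties file given as $1.
missing_props() {
    local file="$1" key pattern
    for key in \
        pega.jdbc.driver.jar pega.jdbc.driver.class pega.database.type \
        pega.jdbc.url pega.jdbc.username pega.jdbc.password \
        rules.schema.name data.schema.name user.temp.dir \
        indexing.indextype indexing.indexdirectory
    do
        # Escape the dots so they match literally in the regular expression.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        # A key counts as set when it starts a line and has a non-blank value.
        grep -Eq "^${pattern}=[^[:space:]]" "$file" || echo "$key"
    done
}
```

An empty result from `missing_props prpcUtils.properties` means every listed key has a value; it does not validate that the values themselves are correct.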


• prbootstrap.properties
Update the below properties:


com.pega.pegarules.bootstrap.allclasses.dbcpsource=pas.oracle
com.pega.pegarules.bootstrap.allclasses.schema=PEG_RULES_OWNER
com.pega.pegarules.bootstrap.datatables.schema=PEG_DATA_OWNER


#source database details 
pas.oracle.url=jdbc:oracle:thin:@dbbdevdb7243.fr.world.socgen:1522/PEGZD100 
pas.oracle.username=<DB username with full access> 
pas.oracle.password=<DB password for the username> 
oracle.jdbc.class=oracle.jdbc.OracleDriver 


com.pega.pegarules.bootstrap.tempdir=/PEGUATWEB009/PRPC/utilstemp
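
Since the same JDBC URL is entered in more than one file, a mismatch is an easy mistake to make. The following sketch compares the two values; get_prop and same_jdbc_url are hypothetical helper names, and the comparison is a plain string match that does not trim trailing whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of key $2 from properties file $1
# (first match only; everything after the first '=' is the value).
get_prop() {
    local file="$1" key="$2"
    awk -v k="$key" \
        'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' \
        "$file"
}

# Hypothetical helper: succeed only if prpcUtils.properties ($1) and
# prbootstrap.properties ($2) point at the same, non-empty JDBC URL.
same_jdbc_url() {
    local utils_props="$1" bootstrap_props="$2" a b
    a=$(get_prop "$utils_props" pega.jdbc.url)
    b=$(get_prop "$bootstrap_props" pas.oracle.url)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_jdbc_url prpcUtils.properties prbootstrap.properties || echo "JDBC URLs differ"` can be run from the utils directory after editing both files.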


• prconfig.xml
Add the following settings:
<env name="database/drivers" value="oracle.jdbc.OracleDriver" /> 


<env name="database/databases/PegaRULES/url" 
value="jdbc:oracle:thin:@dbbdevdb7243.fr.world.socgen:1522/PEGZD100" /> 


<env name="database/databases/PegaRULES/username" value="<username>" /> 
<env name="database/databases/PegaRULES/password" value="<password>" /> 


<env name="database/databases/PegaDATA/url" 
value="jdbc:oracle:thin:@dbbdevdb7243.fr.world.socgen:1522/PEGZD100" /> 


<env name="database/databases/PegaDATA/username" value="<username>" /> 


<env name="database/databases/PegaDATA/password" value="<password>" />


Settings for running the indexer in multiple threads:


Update the following DSS values to run the indexer process in parallel on multiple
worker threads (useful when indexing a very high volume of records, e.g. in
PROD):


indexing/distributed/batch/numworkers value: 3 (as many workers as needed)
indexing/distributed/batch/workqueuesize value: 20000
indexing/distributed/batch/requestbatchsize value: 1000


Note: Running a re-index with these DSS values requires more resources (CPU and
memory).


The above values can also be set in prconfig.xml.
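
As a rough illustration of the prconfig.xml form, the sketch below prints <env> elements for the three settings. It assumes the prconfig.xml setting names are the same strings as the DSS names listed above, which should be confirmed against the Pega documentation for the version in use; env_line is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a prconfig.xml <env> element for setting $1
# with value $2.
env_line() {
    printf '<env name="%s" value="%s" />\n' "$1" "$2"
}

# Assumption: the prconfig.xml names match the DSS names given above.
env_line indexing/distributed/batch/numworkers 3
env_line indexing/distributed/batch/workqueuesize 20000
env_line indexing/distributed/batch/requestbatchsize 1000
```

The printed lines can be pasted into the prconfig.xml used by the utility alongside the database settings.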


Running the Indexer script


> Log in to the server hosting the FTSIndex directory from a command-line client (PuTTY),
as the user that owns all of the directories.

> Navigate to the utils directory where the script file prpcUtils.sh is present.

> Run the command ./prpcUtils.sh indexing (once each for the Work, Data and Rule
types).

[Screenshot: PuTTY session on peguatweb009.fr.world.socgen]

> Logs of the process are printed to the command line and to the log file
CLI-prpcUtils-log-<<DateTimeStamp>>.log in the logs directory under the utils directory.

> The process may run for between 30 minutes (0.3 million records) and 5 hours (1.5 million
records).

> A successful run ends with a success message.
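
Because the script is run once per index type, the indexing.indextype value has to be changed between runs. The sketch below shows one way to do that from the shell and to locate the newest log file. Both set_prop and latest_log are hypothetical helpers, not part of the Pega scripts, and latest_log assumes it is called from the utils directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key $2 to value $3 in properties file $1,
# replacing the existing line or appending one if the key is absent.
set_prop() {
    local file="$1" key="$2" value="$3" tmp status
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Hypothetical helper: print the most recently modified indexer log, if any.
latest_log() {
    ls -t logs/CLI-prpcUtils-log-*.log 2>/dev/null | head -n 1
}
```

For example, `set_prop prpcUtils.properties indexing.indextype Data` followed by `./prpcUtils.sh indexing` runs the Data pass, and `tail -f "$(latest_log)"` follows its log from a second session. The indexing.classes value still has to be adjusted for each run as described in the File Modifications section.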


Post completion of indexer script


> After the indexing has finished running for Work, Data and Rule, open the search landing
page of any running node, add the index host node information and click
“Save Settings”.

> Refresh the search landing page and check that the status for all index types is set to
“Available” and that the sizes have positive values.


Default indexes

Name        Status        Primary size   Total size    Number of documents
All rules   AVAILABLE     2,015.53MB     2,015.53MB    732,640
All data    [illegible]   83.77MB        83.77MB       27,227
All work    AVAILABLE     2,210.03MB     2,210.03MB    1,299,994

Index work attachments (5.0 MB maximum)


> If there is no secondary index directory to be set up, start the FTSIncrementalIndexer
agent again.


Setting up secondary nodes and index directories 


Once the indexing process is complete and the “post completion” steps are done, add one
secondary node at a time and save the settings.


After all secondary nodes are added, enable the FTSIncrementalIndexer agent. On a
multi-node PROD system, enable this agent on all search nodes to distribute the load evenly.


Troubleshooting

Search Index file directory changes on restart


> Make sure that the backward-compatibility Dynamic System Setting
(DSS) indexing/explicitindexdir is not specified. This DSS can be removed as well.
> Add the following JVM setting:
• For permanent index host nodes: -Dindex.directory=/your_index_directory
• For nodes that are never expected to host indexes, add the JVM argument, but do not
specify any directory (leave the value empty).
> After making the changes, it is a best practice to restart the PRPC nodes so that the
changes take effect.
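
The two forms of the JVM argument can be illustrated with a small helper. This is only a sketch: index_directory_arg is a hypothetical function, and where the argument is actually added (for example a JAVA_OPTS variable or the application server's JVM settings) depends on the application server and is not specified in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the -Dindex.directory JVM argument.
# With a directory in $1 the node is a permanent index host; with no
# argument the value is left empty, as described for non-host nodes.
index_directory_arg() {
    printf -- '-Dindex.directory=%s\n' "${1:-}"
}
```

Calling `index_directory_arg /your_index_directory` prints the host-node form, and calling `index_directory_arg` with no argument prints the empty-value form for nodes that never host indexes.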


Indexing directory is not overriding on search landing page

> Stop all indexing nodes.
> Delete the Rules, Data, and Work indexing directory contents.
> Set the IP explicitly by adding the following setting to the prconfig.xml file and specifying
the correct IP to use:
<env name="indexing/distributed/network/host" value="<the IP address your server
should use for internode communication>" />
Example: <env name="indexing/distributed/network/host" value="192.168.1.1" />
> Restart the indexing nodes.
> Check System > Settings > Search > Search Index Host Node Setting to verify that all
nodes are online.
> Reindex Data first (faster), then Work, then Rules. Order matters. If Work will take
longer to reindex than Rules, reindex Rules second. In that case, users trying to find
work items may have to wait until both Data and Rules have finished reindexing.
> Check the FTSIncrementalIndexer queue size under System > Settings > Search >
Agent Information.

This confirms that the issue was a lack of synchronization between the nodes and that the
only way to fix it was to delete all the file contents first, truncate the table, and restart the
nodes and the reindexing process.


